Blabla.
Explosion of computer vision applications.
In ecology in particular to do this and that. Make sure to cite latest review papers.
However, two communities bit disconnected despite efforts by people at the interface, and platforms like Wildlife Insights.
Might explain why papers seldom propose the entire workflow of deep learning and ecological question treatment.
Also difficult to apply for non specialists, different terminology but also Python vs. R.
And lack of reproducibility.
Here, show case complete workflow on a case study on predator prey interactions. Data retrieving (pix), deep learning, then quantification of spatial co-occurrence. We argue that maybe not worth spending valuable time tuning algo to increase perf by one or two percent, if ecological indicators not too much affected. Entirely in R with which ecologists are familiar, and entirely reproducible.
Lynx and its prey
PPP Lynx
Data collection
Main objective
Les collègues de l’OFB et des fédérations de chasse collectent des données dans l’habitat naturel des espèces qui nous intéressent, grâce à des pièges photos laissés à des endroits stratégiques. Il s’agit ici du lynx et de ses proies le chevreuil et le chamois. La méthode est non-invasive, autrement dit on n’a pas besoin de capturer physiquement les animaux. La difficulté est que l’on se retrouve avec des grandes quantités de photos auxquelles il faut associer une étiquette espèce.
I used transfer learning to fine-tune a pre-trained CNN (resnet50) using the annotated pictures from the Jura site. Then I compared the predictions from my new model for the pictures from the Ain site with the manual annotations for these pictures. Transfer learning was achieved with GPU machines.
We use the fastai package that provides R wrappers to fastai. The fastai library simplifies training of CNNs.
Expliquer principe général et les étapes ci-dessous. Below with CPU for reproducibility, subsample of picture datasets, only a few for automatic tagging. But results are proovided with GPU, more epochs and all pictures. Fully-trained model, all pictures, provided via Zenodo.
C’est là qu’entre en jeu le deep learning, de plus en plus utilisé en écologie, voir par exemple (???). L’idée est de nourrir les algorithmes avec des photos en entrée pour en sortie récupérer l’espèce qui se trouve sur la photo. Nous avons utilisé la librairie fast-ai qui repose sur le language Python et sa librairie Pytorch. Un avantage de cette librairie est qu’elle vient avec un package R fastai qui propose plusieurs fonctions pour l’utiliser.
Quels sont les résultats obtenus? Nous avons d’abord fait du transfer learning sur un site d’étude dans le Jura où nous avions des photos déjà étiquetées. Nous avons utilisé un modèle resnet50 déjà pré-entrainé. Nous arrivons à classifier le lynx, et ses proies, le chamoix et le chevreuil, avec un degré de certitude satisfaisant.
Ensuite, nous avons utilisé le modèle pour étiqueter automatiquement des photos prises avec des pièges installés sur un autre site, dans l’Ain. Ces photos ont aussi été étiquetées à la main, on connait donc la vérité.
In this section, we go through a reproducible example of the entire deep learning workflow, from data preparation, training of a model and its validation, to the automatic labeling of pictures. We used a subsample of 467 pictures from the original dataset in the Jura county, plus 14 pictures from the original dataset in the Ain county, to allow the training of our model with CPU on a personal computer. We provide all pictures and the models fully trained with GPU from Zenodo.
We first split the dataset of pictures in two datasets, a dataset for training, and the other one for validation.
We use the exifr package to extract metadata from pictures, get a list of pictures names and extract the species from these.
labels %>%
count(Keywords, sort = TRUE) %>%
# print(n = Inf) %>%
knitr::kable(caption = "Species considered, and number of pictures with these species in them.")| Keywords | n |
|---|---|
| humain | 143 |
| vehicule | 135 |
| renard | 58 |
| sangliers | 33 |
| chasseur | 17 |
| chien | 14 |
| lynx | 13 |
| chevreuil | 13 |
| chamois | 12 |
| blaireaux | 10 |
| chat | 8 |
| lievre | 4 |
| fouine | 1 |
| cavalier | 1 |
Then we pick 80\(\%\) of the pictures for training in each category, the rest being used for validation.
# training dataset
pix_train <- labels %>%
select(SourceFile, FileName, Keywords) %>%
group_by(Keywords) %>%
filter(between(row_number(), 1, floor(n()*80/100))) # 80% per category
# validation dataset
pix_valid <- labels %>%
group_by(Keywords) %>%
filter(between(row_number(), floor(n()*80/100) + 1, n()))Eventually, we store these pictures in two distinct directories named train and valid.
What is the sample size of these two datasets?
| dataset | category | n |
|---|---|---|
| training | humain | 114 |
| training | vehicule | 108 |
| training | chamois | 9 |
| training | blaireaux | 8 |
| training | sangliers | 26 |
| training | renard | 46 |
| training | chasseur | 13 |
| training | lynx | 10 |
| training | chien | 11 |
| training | chat | 6 |
| training | chevreuil | 10 |
| training | lievre | 3 |
| validation | humain | 29 |
| validation | vehicule | 27 |
| validation | chamois | 3 |
| validation | blaireaux | 2 |
| validation | sangliers | 7 |
| validation | renard | 12 |
| validation | chasseur | 4 |
| validation | lynx | 3 |
| validation | chien | 3 |
| validation | fouine | 1 |
| validation | chat | 2 |
| validation | chevreuil | 3 |
| validation | lievre | 1 |
| validation | cavalier | 1 |
We proceed with transfer learning using the pictures from the Jura county (or a subsample more exactly).
We first load pictures, apply standard transformations to improve training (flip, rotate, zoom, rotate, ligth transform).
Then we get the model architecture. For the sake of illustration, we use a resnet18 here, but we used a resnet50 to get the full results presented below.
Now we are ready to train our model. Again, for the sake of illustration, we use only 2 epochs here, but used 20 epochs to get the full results presented below. With all pictures and a resnet50, it took 75 minutes per epoch approximatively on a Mac with a 2.4Ghz Inter Core i9 8 cores processor and 64Go emory with CPU, and less than half an hour with GPU. On this reduced dataset, it took a bit more than a minute per epoch with CPU. Note that we save the model after each epoch for later use.
## epoch train_loss valid_loss accuracy error_rate time
## ------ ----------- ----------- --------- ----------- ------
## 0 2.495430 0.931346 0.708333 0.291667 01:25
## 1 1.634887 0.785826 0.791667 0.208333 01:23
## epoch train_loss valid_loss accuracy error_rate
## 1 0 2.495430 0.9313456 0.7083333 0.2916667
## 2 1 1.634887 0.7858264 0.7916667 0.2083333
We may dig a bit deeper in training performances by loading the best model, here model_1.pth, and display some metrics for each species.
## Sequential(
## (0): Sequential(
## (0): Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
## (1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
## (2): ReLU(inplace=True)
## (3): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
## (4): Sequential(
## (0): BasicBlock(
## (conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
## (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
## (relu): ReLU(inplace=True)
## (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
## (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
## )
## (1): BasicBlock(
## (conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
## (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
## (relu): ReLU(inplace=True)
## (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
## (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
## )
## )
## (5): Sequential(
## (0): BasicBlock(
## (conv1): Conv2d(64, 128, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
## (bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
## (relu): ReLU(inplace=True)
## (conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
## (bn2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
## (downsample): Sequential(
## (0): Conv2d(64, 128, kernel_size=(1, 1), stride=(2, 2), bias=False)
## (1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
## )
## )
## (1): BasicBlock(
## (conv1): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
## (bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
## (relu): ReLU(inplace=True)
## (conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
## (bn2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
## )
## )
## (6): Sequential(
## (0): BasicBlock(
## (conv1): Conv2d(128, 256, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
## (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
## (relu): ReLU(inplace=True)
## (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
## (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
## (downsample): Sequential(
## (0): Conv2d(128, 256, kernel_size=(1, 1), stride=(2, 2), bias=False)
## (1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
## )
## )
## (1): BasicBlock(
## (conv1): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
## (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
## (relu): ReLU(inplace=True)
## (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
## (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
## )
## )
## (7): Sequential(
## (0): BasicBlock(
## (conv1): Conv2d(256, 512, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
## (bn1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
## (relu): ReLU(inplace=True)
## (conv2): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
## (bn2): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
## (downsample): Sequential(
## (0): Conv2d(256, 512, kernel_size=(1, 1), stride=(2, 2), bias=False)
## (1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
## )
## )
## (1): BasicBlock(
## (conv1): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
## (bn1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
## (relu): ReLU(inplace=True)
## (conv2): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
## (bn2): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
## )
## )
## )
## (1): Sequential(
## (0): AdaptiveConcatPool2d(
## (ap): AdaptiveAvgPool2d(output_size=1)
## (mp): AdaptiveMaxPool2d(output_size=1)
## )
## (1): Flatten(full=False)
## (2): BatchNorm1d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
## (3): Dropout(p=0.25, inplace=False)
## (4): Linear(in_features=1024, out_features=512, bias=False)
## (5): ReLU(inplace=True)
## (6): BatchNorm1d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
## (7): Dropout(p=0.5, inplace=False)
## (8): Linear(in_features=512, out_features=12, bias=False)
## )
## )
Four metrics are given.
Precision is the ratio tp / (tp + fp) where tp is the number of true positives and fp the number of false positives. It quantifies the ability of the classifier not to label as positive a sample that is negative. The question that this metric answers is of all pix that labeled as positive, how many actually were positive?
The recall is the ratio tp / (tp + fn) where tp is the number of true positives and fn the number of false negatives. The recall is intuitively the ability of the classifier to find all the positive samples. The question recall answers is: of all the pix that were positive, how many did we label?
The f1-score can be interpreted as a weighted average of the precision and recall, it reaches its best value at 1 and worst score at 0.
The f1-score weights recall more than precision by a factor of 1, meaning that recall and precision are equally important. f1-score is 2 * (recall * precision) / (recall + precision)
The support is the number of occurrences of each class in y_true (ground truth or correct target values).
macro avg: Calculate metrics for each label, and find their unweighted mean. This does not take label imbalance into account.
weighted avg: Calculate metrics for each label, and find their average weighted by support (the number of true instances for each label). This alters ‘macro’ to account for label imbalance.
Accuracy is the most intuitive performance measure and it is simply a ratio of correctly predicted observation to the total observations, it is (tp + tn)/(tp + tn + fp + fn)
We may extract the categories that get the most confused.
## V1 V2 V3
## 1 sangliers renard 3
## 2 chamois chien 2
## 3 chat renard 2
## 4 blaireaux renard 1
## 5 chamois sangliers 1
## 6 chevreuil renard 1
## 7 chien chamois 1
## 8 chien renard 1
## 9 chien vehicule 1
## 10 humain renard 1
## 11 humain vehicule 1
## 12 lievre renard 1
## 13 lynx chat 1
## 14 renard chevreuil 1
## 15 sangliers lynx 1
## 16 vehicule humain 1
In this section, we show how to use our freshly trained model to label pictures that were taken in another study site in the Ain county.
First, we get the path to the pictures.
Then carry out the prediction, and compare to the truth.
| truth | prediction |
|---|---|
| lynx | chevreuil |
| roe deer | chamois |
| wild boar | sangliers |
Sur la base du nombre de faux négatifs (une photo sur laquelle on a un lynx mais on prédit une autre espèce) et de faux positifs (une photo sur laquelle on n’a pas de lynx mais on prédit qu’il y en a un), les résultats sont peu satisfaisants. Toutefois, la question est de savoir si le manque de précision nuit à l’inférence des interactions prédateur-proie. Pour ce faire, on a utilisé des modèles statistiques qui permettent d’inférer les co-occurrences entre espèces en tenant compte de la difficulté de les détecter sur le terrain. Ce sont les modèles d’occupancy développés par Rota et al. (2016) et implémentés dans R par Fiske and Chandler (2011).
On obtient les probabilités de présence du lynx, conditionnellement à la présence ou absence de ses proies (Figure 1). Il y a un léger biais dans l’estimation de la probabilité de présence du lynx sachant la présence de ses deux proies favorites quand on se fie à l’étiquetage automatique des photos. Etant donné que les différences ne sont pas énormes, l’écologue pourra décider de les ignorer au regard du temps gagné par rapport à un étiquetage à la main. Maintenant le biais est plus important sur la probabilité de présence du lynx sachant la présence du chevreuil et l’absence du chamois qui elle est sous-estimée.
Probabilités de présence du lynx, conditionnellement à la présence ou absence de ses proies. En rouge, avec les photos étiquetées à la main. En gris-bleu, avec les photos étiquetées automatiquement.
En conclusion, l’utilisation d’un modèle entrainé sur un site pour prédire sur un autre site est délicate. Il est facile de se perdre dans les dédales du deep learning, mais il faut garder le cap de la question écologique, et on peut accepter des performances moyennes des algorithmes si le biais engendré sur les indicateurs écologiques est faible. Malgré tout, on peut faire mieux, et nous développons actuellement des modèles de distribution d’espèce qui prendrait à la fois en compte les interactions et les faux positifs et faux négatifs. Pour aller plus loin avec le deep learning et l’analyse d’images, nous renvoyons vers Miele, Dray, and Gimenez (2021).
Summarise what we did.
Main lessons.
To be extended.
ANR. Folks who have labeled pix if not co-authors. MBB folks. Vincent Miele for his help along the way, and being an inspiration.
## R version 4.0.2 (2020-06-22)
## Platform: x86_64-apple-darwin17.0 (64-bit)
## Running under: macOS Catalina 10.15.7
##
## Matrix products: default
## BLAS: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRblas.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib
##
## locale:
## [1] fr_FR.UTF-8/fr_FR.UTF-8/fr_FR.UTF-8/C/fr_FR.UTF-8/fr_FR.UTF-8
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] exifr_0.3.1 fastai_2.0.4 forcats_0.5.1 stringr_1.4.0
## [5] dplyr_1.0.7 purrr_0.3.4 readr_1.4.0 tidyr_1.1.3
## [9] tibble_3.1.3 ggplot2_3.3.5 tidyverse_1.3.0
##
## loaded via a namespace (and not attached):
## [1] httr_1.4.2 sass_0.3.1.9001 jsonlite_1.7.2
## [4] carData_3.0-4 modelr_0.1.8 bslib_0.2.4
## [7] assertthat_0.2.1 highr_0.9 cellranger_1.1.0
## [10] yaml_2.2.1 pillar_1.6.1 backports_1.2.1
## [13] lattice_0.20-41 glue_1.4.2 reticulate_1.20
## [16] digest_0.6.27 ggsignif_0.6.1 rvest_1.0.0
## [19] colorspace_2.0-2 htmltools_0.5.1.9002 Matrix_1.3-2
## [22] plyr_1.8.6 pkgconfig_2.0.3 broom_0.7.6
## [25] haven_2.3.1 scales_1.1.1 openxlsx_4.2.3
## [28] rio_0.5.26 generics_0.1.0 car_3.0-10
## [31] ellipsis_0.3.2 ggpubr_0.4.0 withr_2.4.2
## [34] cli_3.0.1 magrittr_2.0.1 crayon_1.4.1
## [37] readxl_1.3.1 evaluate_0.14 fs_1.5.0
## [40] fansi_0.5.0 foreign_0.8-81 rstatix_0.7.0
## [43] xml2_1.3.2 data.table_1.14.0 tools_4.0.2
## [46] hms_1.1.0 lifecycle_1.0.0 munsell_0.5.0
## [49] reprex_1.0.0 zip_2.1.1 compiler_4.0.2
## [52] jquerylib_0.1.3 rlang_0.4.11.9001 grid_4.0.2
## [55] rstudioapi_0.13 rappdirs_0.3.3 rmarkdown_2.7
## [58] gtable_0.3.0 abind_1.4-5 DBI_1.1.1
## [61] curl_4.3.2 R6_2.5.0 lubridate_1.7.10
## [64] knitr_1.33 fastmap_1.1.0 utf8_1.2.2
## [67] stringi_1.6.2 Rcpp_1.0.7 vctrs_0.3.8
## [70] png_0.1-7 dbplyr_2.1.0 tidyselect_1.1.1
## [73] xfun_0.24
Fiske, Ian, and Richard Chandler. 2011. “unmarked: An R Package for Fitting Hierarchical Models of Wildlife Occurrence and Abundance.” Journal of Statistical Software 43 (10): 1–23.
Lahoz-Monfort, José J, and Michael J L Magrath. 2021. “A Comprehensive Overview of Technologies for Species and Habitat Monitoring and Conservation.” BioScience. https://doi.org/10.1093/biosci/biab073.
Miele, Vincent, Stéphane Dray, and Olivier O. Gimenez. 2021. “Images, écologie et deep learning.” Regards sur la biodiversité, February. https://hal.archives-ouvertes.fr/hal-03142486.
Miele, Vincent, Gaspard Dussert, Bruno Spataro, Simon Chamaillé‐Jammes, Dominique Allainé, and Christophe Bonenfant. 2021. “Revisiting Animal Photo‐identification Using Deep Metric Learning and Network Analysis.” Edited by Robert Freckleton. Methods in Ecology and Evolution 12 (5): 863–73. https://doi.org/10.1111/2041-210X.13577.
Rota, Christopher T., Marco A. R. Ferreira, Roland W. Kays, Tavis D. Forrester, Elizabeth L. Kalies, William J. McShea, Arielle W. Parsons, and Joshua J. Millspaugh. 2016. “A Multispecies Occupancy Model for Two or More Interacting Species.” Methods in Ecology and Evolution 7 (10): 1164–73.